-
Notifications
You must be signed in to change notification settings - Fork 483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use different regexes in KeywordDetector to improve accuracy #86
Conversation
Turned on Keyword detector by default Down-graded version to '0.1.666' to test it on a few repos without causing havoc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One nit
Just stating for memories sake, that we had a discussion that I'll paraphrase here, I'll maybe make a separate issue: Currently, the assumption is that "no secret is repeated within a single file" however, with this plugin it is more likely that this assumption is gonna break e.g. if you have 2 different assignments of We can either (a) incorporate line numbers in
The downsides of doing nothing, and merging the PR as is:
We agree changing that assumption is a larger task than what's at hand. |
Filter out $variables for PHP files Filter out `(|[` followed by `)|]` Add `not`, more empty quotes and `password` variable names to FALSE_POSITIVES
After merging in master
Trim uncovered code Change tox to ensure tests are covered 100%
Removed `token` as a keyword Made FOLLOWED_BY_EQUAL_SIGNS_RE require variable ends with keyword
"""Generates raw secrets by re-scanning the line, with the specified plugin""" | ||
for raw_secret in plugin.secret_generator(secret_line): | ||
yield raw_secret | ||
if isinstance(plugin, KeywordDetector): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I felt :/ about writing it like this, but didn't see a better way.
Just wanted to put neon lights on it for the review 😅
In keyword_test.py
Made quotes required in Python files/added regexes for this Added a Filetype Enum and `determine_file_type` function Replaced 'pass' with 'db_pass' in BLACKLIST Added 'aws_secret_access_key' to BLACKLIST Added some trailing char cases to FALSE_POSITIVES :boom: Changed secret_type to 'Secret Keyword'
ea99830
to
e01d818
Compare
ce5862b
to
ec1e0cd
Compare
By adding an optional `((\'|")])?` to the regexes This is to catch 'foo' in e.g. `some_dict["secret"] = "foo"`
ec1e0cd
to
b7e48ab
Compare
76ddcdf
to
a37a9c9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM based on internal testing. There are some remaining false positives but it should be okay.
Added Javascript specific false-positive checks Added ${ before } heuristic for e.g. ${link} Added more false-positives to FALSE_POSITIVES :zap: keyword_test.py Make STANDARD_NEGATIVES list and STANDARD_POSITIVES set for DRYness
7dd4926
to
a29108b
Compare
No description provided.